# README

## Overview

This dataset provides bibliographic and methodological information on a random sample of academic articles involving propensity score matching (PSM). Each row corresponds to a single article and contains information such as the article’s DOI, title, authors, publication year, and a series of binary or categorical indicators describing the methods used or reported within the study. These indicators focus largely on PSM, balance reporting practices, and other causal inference or evaluation techniques.

The dataset is intended to help researchers quickly identify methodological characteristics of this set of studies. 

## File Format and Structure

- **File Format:** The data is provided in a tab-separated values (TSV) file, with a header row followed by one row per article.
- **Character Encoding:** UTF-8
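
As a minimal sketch of reading this layout, the example below parses tab-separated text with Python's standard `csv` module. The sample rows, the `read_articles` helper, and the file name `articles.tsv` in the comment are hypothetical and only illustrate the documented structure. It also assumes default CSV quoting rules, which may need adjusting for the real file.

```python
import csv
import io

# Hypothetical two-row sample in the documented layout; all values are made up.
SAMPLE_TSV = (
    "full_DOI\ttitle\tpub_year\tps_performed\n"
    "10.0000/example.1\tAn example title\t2015\t1\n"
    "10.0000/example.2\tAnother example title\t2018\tNA\n"
)


def read_articles(handle):
    """Return one dict per article, keyed by the names in the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


# For the real file, open it as UTF-8 instead, for example:
#   with open("articles.tsv", encoding="utf-8", newline="") as f:  # hypothetical name
#       articles = read_articles(f)
articles = read_articles(io.StringIO(SAMPLE_TSV))
print(len(articles), articles[0]["pub_year"])
```

Every field is read as a string, so `pub_year` and the 0/1/NA indicators need explicit conversion before analysis.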

### Columns Description

1. **full_DOI**:  
   - **Type:** String  
   - **Description:** The article’s full Digital Object Identifier (DOI).

2. **title**:  
   - **Type:** String  
   - **Description:** The full title of the article.

3. **author**:  
   - **Type:** String  
   - **Description:** The authors of the article as listed in the source.

4. **reduced_DOI**:  
   - **Type:** String or Numeric (transformed identifier)  
   - **Description:** A simplified or shortened DOI, possibly used as an internal reference ID.

5. **pub_year**:  
   - **Type:** Integer  
   - **Description:** Year the article was published.

6. **bal_report_prematch**, **bal_noreport_prematch**, **no_bal_prematch**:  
   - **Type:** Binary (0/1 or NA)  
   - **Description:** Indicators of whether the study reports balance checks before matching.  
   - **Interpretation:**  
     - `bal_report_prematch`: Balance reported pre-match.  
     - `bal_noreport_prematch`: Balance not reported pre-match.  
     - `no_bal_prematch`: Not applicable or no pre-match balance details provided.

7. **bal_report_postmatch**, **bal_noreport_postmatch**, **no_bal_postmatch**:  
   - **Type:** Binary (0/1 or NA)  
   - **Description:** Indicators of whether the study reports balance checks after matching.  
   - **Interpretation:**  
     - `bal_report_postmatch`: Balance reported post-match.  
     - `bal_noreport_postmatch`: Balance not reported post-match.  
     - `no_bal_postmatch`: Not applicable or no post-match balance details provided.

8. **psm_no_details**, **psm_simple**, **psm_w_exact**, **psm_iterate**:  
   - **Type:** Binary (0/1 or NA)  
   - **Description:** Indicators of how propensity score matching was conducted or reported.  
   - **Interpretation:**  
     - `psm_no_details`: PSM mentioned but no procedural details provided.  
     - `psm_simple`: Simple propensity score matching reported (e.g., one-to-one matching).  
     - `psm_w_exact`: PSM with exact matching on some covariates reported.  
     - `psm_iterate`: Iterative or refined PSM approaches reported.

9. **ps_match_cov**:  
   - **Type:** Binary (0/1 or NA)  
   - **Description:** Indicates whether the study reports the covariates used in propensity score modeling.

10. **psm_algorithm**:  
    - **Type:** String or Categorical  
    - **Description:** Indicates the type of PSM algorithm used (e.g., nearest-neighbor or kernel matching).

11. **simulation_data**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Whether the article used simulated data in its analysis.

12. **psm_dif_in_dif**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if the study combined difference-in-differences (DID) with PSM.

13. **ps_strat**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if propensity score stratification was used.

14. **psm_as_robustness**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Whether PSM was applied as a robustness check rather than the primary method.

15. **ps_remove_only**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if PSM was used solely to remove observations outside common support, without further adjustments.

16. **ps_performed**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if any form of PSM was performed.

17. **ps_reg**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if propensity score regression adjustment was performed.

18. **psm_missing**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Whether the study addressed or mentioned missing data issues specifically in the context of PSM.

19. **ps_sample**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if sampling based on propensity scores (e.g., restricting analysis to certain propensity score strata) was used.

20. **ps_weighting**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if inverse probability weighting or another form of propensity score weighting was used.

21. **ps_other**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if any other propensity score methodology (not categorized above) was employed.

22. **methods_article**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if the article is methods-focused, i.e., primarily about methodology rather than an applied empirical study.

23. **response_comment_article**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if the article is a response, comment, or discussion piece rather than a standard research article.

24. **english**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Indicates if the article is written in English.

25. **initial_sample**:  
    - **Type:** Binary (0/1 or NA)  
    - **Description:** Whether the initial sample (prior to any methodological exclusion criteria) is discussed or provided.

26. **other_notes**:  
    - **Type:** String or NA  
    - **Description:** Additional notes or comments about the study or data coding that don’t fit into the other categories.
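
The binary columns above take the values 0, 1, or NA, so they need to be decoded before filtering or counting. The sketch below shows one possible approach, assuming the values arrive as the strings `"0"`, `"1"`, and `"NA"` (or empty). The `parse_indicator` helper and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def parse_indicator(value):
    """Map "1" to True, "0" to False, and "NA" or an empty field to None."""
    value = (value or "").strip()
    if value == "1":
        return True
    if value == "0":
        return False
    if value in ("", "NA"):
        return None
    raise ValueError(f"unexpected indicator value: {value!r}")


# Made-up rows standing in for parsed TSV records (one dict per article).
rows = [
    {"ps_performed": "1", "english": "1", "bal_report_postmatch": "1"},
    {"ps_performed": "1", "english": "1", "bal_report_postmatch": "0"},
    {"ps_performed": "0", "english": "1", "bal_report_postmatch": "NA"},
    {"ps_performed": "1", "english": "NA", "bal_report_postmatch": "1"},
]

# Keep articles coded as English in which some form of PSM was performed.
# Rows with NA in either column are excluded because None is falsy.
selected = [
    r for r in rows
    if parse_indicator(r["ps_performed"]) and parse_indicator(r["english"])
]

# Tabulate post-match balance reporting; NA values would be counted under None.
counts = Counter(parse_indicator(r["bal_report_postmatch"]) for r in selected)
print(counts)
```

Treating NA as a third state rather than as 0 keeps "not coded" distinct from "coded as absent", which matters when computing proportions.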
